Skip to content

Conversation

@nielsbauman
Copy link
Contributor

This method would default to starting a new node when the cluster was empty. This is pretty trappy as getClient() (or things like getMaster() that depend on getClient()) don't look at all like something that would start a new node.

In any case, the intention of tests is much clearer when they explicitly define a cluster configuration.

This method would default to starting a new node when the cluster was
empty. This is pretty trappy as `getClient()` (or things like
`getMaster()` that depend on `getClient()`) don't look at all like
something that would start a new node.

In any case, the intention of tests is much clearer when they explicitly
define a cluster configuration.
@nielsbauman nielsbauman added >test Issues or PRs that are addressing/adding tests v9.1.0 labels Apr 24, 2025
@elasticsearchmachine elasticsearchmachine added serverless-linked Added by automation, don't add manually needs:triage Requires assignment of a team area label labels Apr 24, 2025
Copy link
Contributor Author

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't really come up with a great way to split this PR up. If it's too much to review, any suggestions for splitting it up are welcome.

* An integration test that verifies how different paths/scenarios affect the APM metrics for failure stores.
*/
@ESIntegTestCase.ClusterScope(numDataNodes = 0, numClientNodes = 0, scope = ESIntegTestCase.Scope.SUITE)
@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 1, numClientNodes = 0, supportsDedicatedMasters = false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests depend on there being exactly one node in the cluster, which I solved like this.


import java.util.Collection;

@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0, numClientNodes = 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests didn't actually need any specific cluster configuration.

}

public void testClusterRestartWithLicense() throws Exception {
wipeAllLicenses();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wiping the licenses before the test starts doesn't make sense if we start with an empty cluster (i.e. no nodes).

@nielsbauman nielsbauman added Team:Distributed Coordination Meta label for Distributed Coordination team :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. and removed needs:triage Requires assignment of a team area label labels Apr 24, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM nice work

@nielsbauman nielsbauman enabled auto-merge (squash) April 25, 2025 07:06
@nielsbauman nielsbauman merged commit c72d00f into elastic:main Apr 25, 2025
16 of 17 checks passed
@nielsbauman nielsbauman deleted the fix-get-client branch April 25, 2025 08:07
slobodanadamovic added a commit that referenced this pull request Jun 24, 2025
The #127318 changed the behaviour of `client()` to not start a node if there 
is none found in the cluster. Which also changed the `getMasterName()` 
behaviour to simply fail if there are no nodes instead of starting one.

This is why the `getMasterName()` is failing now. There were no nodes 
started because the test scope is set to manually manage master nodes 
(`autoManageMasterNodes = false`) without data nodes (`numDataNodes = 0`).

The fix is to actually start the master node instead of attempting to get 
the master node name from an empty cluster and depend on a side effect 
to actually boostrap a node.

Additionally it awaits for the master node to process all cluster state events 
before proceeding, which should hopefully solve the original cause of failures.

Resolves #120964
Resolves #120923
slobodanadamovic added a commit to slobodanadamovic/elasticsearch that referenced this pull request Jun 24, 2025
The elastic#127318 changed the behaviour of `client()` to not start a node if there
is none found in the cluster. Which also changed the `getMasterName()`
behaviour to simply fail if there are no nodes instead of starting one.

This is why the `getMasterName()` is failing now. There were no nodes
started because the test scope is set to manually manage master nodes
(`autoManageMasterNodes = false`) without data nodes (`numDataNodes = 0`).

The fix is to actually start the master node instead of attempting to get
the master node name from an empty cluster and depend on a side effect
to actually boostrap a node.

Additionally it awaits for the master node to process all cluster state events
before proceeding, which should hopefully solve the original cause of failures.

Resolves elastic#120964
Resolves elastic#120923

(cherry picked from commit ee6d64a)

# Conflicts:
#	muted-tests.yml
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jun 25, 2025
The elastic#127318 changed the behaviour of `client()` to not start a node if there 
is none found in the cluster. Which also changed the `getMasterName()` 
behaviour to simply fail if there are no nodes instead of starting one.

This is why the `getMasterName()` is failing now. There were no nodes 
started because the test scope is set to manually manage master nodes 
(`autoManageMasterNodes = false`) without data nodes (`numDataNodes = 0`).

The fix is to actually start the master node instead of attempting to get 
the master node name from an empty cluster and depend on a side effect 
to actually boostrap a node.

Additionally it awaits for the master node to process all cluster state events 
before proceeding, which should hopefully solve the original cause of failures.

Resolves elastic#120964
Resolves elastic#120923
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. serverless-linked Added by automation, don't add manually Team:Distributed Coordination Meta label for Distributed Coordination team >test Issues or PRs that are addressing/adding tests v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants